Abstract. Functional verification constitutes one of the most challenging
tasks in the development of modern hardware systems, and simulation-based
verification techniques dominate the functional verification landscape. A
dominant paradigm in simulation-based verification is directed random testing,
where a model of the system is simulated with a set of random test stimuli
that are uniformly or near-uniformly distributed over the space of all stimuli
satisfying a given set of constraints. Uniform or near-uniform generation of
solutions for large constraint sets is therefore a problem of theoretical and
practical interest. For Boolean constraints, prior work offered heuristic
approaches with no guarantee of performance, and theoretical approaches with
proven guarantees, but poor performance in practice. We offer here a new
approach with theoretical performance guarantees and demonstrate its practical
utility on large constraint sets.

1 Introduction

Functional verification constitutes one of the most challenging tasks in the
development of modern hardware systems. Despite significant advances in formal
verification over the last few decades, there is a huge mismatch between
the sizes of industrial systems and the capabilities of state-of-the-art
formal-verification tools [6]. Simulation-based verification techniques therefore
dominate the functional-verification landscape [5]. A dominant paradigm in
simulation-based verification is directed random testing. In this paradigm, an
operational (usually, low-level) model of the system is simulated with a set of
random test stimuli satisfying a set of constraints [7,18,23]. The simulated
behavior is then compared with the expected behavior, and any mismatch is flagged
as indicative of a bug. The constraints that stimuli must satisfy typically arise
from various sources such as domain and application-specific knowledge,
architectural and environmental requirements, specifications of corner-case
scenarios, and the like. Test requirements from these varied sources are compiled
into a set of constraints and fed to a constraint solver to obtain test stimuli.
Developing constraint solvers (and test generators) that can reason about large
sets of constraints is therefore an extremely important activity for industrial
test and verification applications [13].

The final version will appear in the Proceedings of CAV'13 and will be available
at link.springer.com. Work supported in part by NSF grants CNS 1049862 and
CCF-1139011, by NSF Expeditions in Computing project "ExCAPE: Expeditions in
Computer Augmented Program Engineering", by BSF grant 9800096, by a gift from
Intel, by a grant from the Board of Research in Nuclear Sciences, India, and by
the Shared University Grid at Rice funded by NSF under Grant EIA-0216467, and a
partnership between Rice University, Sun Microsystems, and Sigma Solutions, Inc.

Despite the diligence and insights that go into developing constraint sets for 
generating directed random tests, the complexity of modern hardware systems 
makes it hard to predict the effectiveness of any specific test stimulus. It is there- 
fore common practice to generate a large number of stimuli satisfying a set of 
constraints. Since every stimulus is a priori as likely to expose a bug as any 
other stimulus, it is desirable to sample the solution space of the constraints 
uniformly or near-uniformly (defined formally below) at random [18]. A naive
way to accomplish this is to first generate all possible solutions, and then sample 
them uniformly. Unfortunately, generating all solutions is computationally pro- 
hibitive (and often infeasible) in practical settings of directed random testing. 
For example, we have encountered systems of constraints where the expected 
number of solutions is of the order of $2^{100}$, and there is no simple way of
deriving one solution from another. It is therefore interesting to ask: Given a set of
constraints, can we sample the solution space uniformly or near-uniformly, while 
scaling to problem sizes typical of testing/verification scenarios? An affirmative
answer to this question has implications not only for directed random testing,
but also for other applications like probabilistic reasoning, approximate model
counting and Markov logic networks [4,19].

In this paper, we consider Boolean constraints in conjunctive normal form 
(CNF), and address the problem of near-uniform generation of their solutions, 
henceforth called SAT Witnesses. This problem has been of long-standing
theoretical interest [20,21]. Industrial approaches to solving this problem either rely
on ROBDD-based techniques [121 , which do not scale well (see, for example, the 
comparison in [16]), or use heuristics that offer no guarantee of performance or
uniformity when applied to large problem instances. Prior published work in this
area broadly belongs to one of two categories. In the first category [22,15,12,16],
the focus is on heuristic sampling techniques that scale to large systems of con- 
straints. Monte Carlo Markov Chain (MCMC) methods and techniques based on 
random seedings of SAT solvers belong to this category. However, these methods 
either offer very weak or no guarantees on the uniformity of sampling (see [16] for
a comparison), or require the user to provide hard-to-estimate problem-specific
parameters that crucially affect the performance and uniformity of sampling.
In the second category of work [5,14,23], the focus is on stronger guarantees of
uniformity of sampling. Unfortunately, our experience indicates that these
techniques do not scale even to relatively small problem instances (involving a
few tens of variables) in practice.

The work presented in this paper tries to bridge the above-mentioned
extremes. Specifically, we provide guarantees of near-uniform sampling, and of a
bounded probability of failure, without the user having to provide any hard-to- 
estimate parameters. We also demonstrate that our algorithm scales in practice 
to constraints involving thousands of variables. Note that there is evidence that 
uniform generation of SAT witnesses is harder than SAT solving [14]. Thus,
while today's SAT solvers are able to handle hundreds of thousands of variables 
and more, we believe that scaling of our algorithm to thousands of variables is 
a major improvement in this area. Since a significant body of constraints that 
arise in verification settings and in other application areas (like probabilistic rea- 
soning) can be encoded as Boolean constraints, our work opens new directions 
in directed random testing and in these application areas. 

The remainder of the paper is organized as follows. In Section 2, we review
preliminaries and notation needed for the subsequent discussion. In Section 3,
we give an overview of some algorithms presented in earlier work that come close
to our work. Design choices behind our algorithm, some implementation issues,
and a mathematical analysis of the guarantees provided by our algorithm are
discussed in Section 4. Section 5 discusses experimental results on a large set of
benchmarks. Our experiments demonstrate that our algorithm is more efficient
in practice and generates witnesses that are more evenly distributed than those
generated by the best known alternative algorithm that scales to comparable
problem sizes. Finally, we conclude in Section 6.

2 Notation and Preliminaries 

Our algorithm can be viewed as an adaptation of the algorithm proposed by 
Bellare, Goldreich and Petrank [5] for uniform generation of witnesses for
$\mathcal{NP}$-relations. In the remainder of the paper, we refer to Bellare et al.'s
algorithm as the BGP algorithm (after the last names of the authors). Our algorithm
also has similarities with algorithms presented by Gomes, Sabharwal and Selman [12]
for near-uniform sampling of SAT witnesses. We begin with some notation and
preliminaries needed to understand this related work.

Let $\Sigma$ be an alphabet and $R \subseteq \Sigma^* \times \Sigma^*$ be a binary
relation. We say that $R$ is an $\mathcal{NP}$-relation if $R$ is polynomial-time
decidable, and if there exists a polynomial $p(\cdot)$ such that for every
$(x, y) \in R$, we have $|y| \le p(|x|)$. Let $L_R$ be the language
$\{x \in \Sigma^* \mid \exists y \in \Sigma^*, (x, y) \in R\}$. The language $L_R$ is
said to be in $\mathcal{NP}$ if $R$ is an $\mathcal{NP}$-relation. The set of all
satisfiable propositional logic formulae in CNF is known to be a language in
$\mathcal{NP}$. Given $x \in L_R$, a witness of $x$ is a string $y \in \Sigma^*$ such
that $(x, y) \in R$. The set of all witnesses of $x$ is denoted $R_x$. For notational
convenience, let us fix $\Sigma$ to be $\{0, 1\}$ without loss of generality. If $R$
is an $\mathcal{NP}$-relation, we may further assume that for every $x \in L_R$, every
witness $y \in R_x$ is in $\{0, 1\}^n$, where $n = p(|x|)$ for some polynomial
$p(\cdot)$.

Given an $\mathcal{NP}$-relation $R$, a probabilistic generator of witnesses for $R$
is a probabilistic algorithm $\mathcal{G}(\cdot)$ that takes as input a string
$x \in L_R$ and generates a random witness of $x$. Throughout this paper, we use
$\Pr[X]$ to denote the probability of outcome $X$ of sampling from a probability
space. A uniform generator $\mathcal{G}^u(\cdot)$ is a probabilistic generator that
guarantees $\Pr[\mathcal{G}^u(x) = y] = 1/|R_x|$ for every witness $y$ of $x$. A
near-uniform generator $\mathcal{G}^{nu}(\cdot)$ relaxes the guarantee of uniformity,
and ensures that $\Pr[\mathcal{G}^{nu}(x) = y] \ge c \cdot (1/|R_x|)$ for a constant
$c$, where $0 < c \le 1$. Clearly, the larger $c$ is, the closer a near-uniform
generator is to being a uniform generator. Note that near-uniformity, as defined
above, is a more relaxed approximation of uniformity compared to the notion of
"almost uniformity" introduced in [5,14]. In the present work, we sacrifice the
guarantee of uniformity and settle for a near-uniform generator in order to gain
performance benefits. Our experiments, however, show that the witnesses generated
by our algorithm are fairly uniform in practice. Like previous work [5,14], we allow
our generator to occasionally "fail", i.e. the generator may occasionally output no
witness, but a special failure symbol $\bot$. A generator that occasionally fails
must have its failure probability bounded above by $d$, where $d$ is a constant
strictly less than 1.

A key idea in the BGP algorithm for uniform generation of witnesses for
$\mathcal{NP}$-relations is to use $r$-wise independent hash functions that map
strings in $\{0,1\}^n$ to $\{0,1\}^m$, for $m \le n$. The objective of using these
hash functions is to partition $R_x$ with high probability into a set of
"well-balanced" and "small" cells. We follow a similar idea in our work, although
there are important differences. Borrowing related notation and terminology from
[5], we give below a brief overview of $r$-wise independent hash functions as used
in our context.

Let $n$, $m$ and $r$ be positive integers, and let $H(n, m, r)$ denote a family of
$r$-wise independent hash functions mapping $\{0,1\}^n$ to $\{0,1\}^m$. We use
$h \xleftarrow{R} H(n, m, r)$ to denote the act of choosing a hash function $h$
uniformly at random from $H(n, m, r)$. By virtue of $r$-wise independence, for each
$\alpha_1, \ldots, \alpha_r \in \{0,1\}^m$ and for each distinct
$y_1, \ldots, y_r \in \{0,1\}^n$,
$\Pr\left[\bigwedge_{i=1}^{r} h(y_i) = \alpha_i : h \xleftarrow{R} H(n, m, r)\right] = 2^{-mr}$.

For every $\alpha \in \{0,1\}^m$ and $h \in H(n, m, r)$, let $h^{-1}(\alpha)$ denote
the set $\{y \in \{0,1\}^n \mid h(y) = \alpha\}$. Given $R_x \subseteq \{0,1\}^n$ and
$h \in H(n, m, r)$, we use $R_{x,h,\alpha}$ to denote the set
$R_x \cap h^{-1}(\alpha)$. If we keep $h$ fixed and let $\alpha$ range over
$\{0,1\}^m$, the sets $R_{x,h,\alpha}$ form a partition of $R_x$. Following the
notation of Bellare et al., we call each element of such a partition a cell of $R_x$
induced by $h$. It has been argued in [5] that if $h$ is chosen uniformly at random
from $H(n, m, r)$ for $r \ge 1$, the expected size of $R_{x,h,\alpha}$, denoted
$E[|R_{x,h,\alpha}|]$, is $|R_x|/2^m$, for each $\alpha \in \{0,1\}^m$.

In [5], the authors suggest using polynomials over finite fields to generate
$r$-wise independent hash functions. We call these algebraic hash functions.
Choosing a random algebraic hash function $h \in H(n, m, r)$ requires choosing a
sequence $(a_0, \ldots, a_{r-1})$ of elements in the field
$\mathbb{F} = GF(2^{\max(n,m)})$, where $GF(2^k)$ denotes the Galois field of $2^k$
elements. Given $y \in \{0,1\}^n$, the hash value $h(y)$ can be computed by
interpreting $y$ as an element of $\mathbb{F}$, computing
$\sum_{j=0}^{r-1} a_j y^j$ in $\mathbb{F}$, and selecting $m$ bits of the encoding
of the result. The authors of [5] suggest polynomial-time optimizations for
operations in the field $\mathbb{F}$. Unfortunately, even with these optimizations,
computing algebraic hash functions is quite expensive in practice when non-linear
terms are involved, as in $\sum_{j=0}^{r-1} a_j y^j$.



Our approach uses computationally efficient linear hash functions. As we
show later, pairwise independent hash functions suffice for our purposes. The
literature describes several families of efficiently computable pairwise
independent hash functions. One such family, which we denote $H_{conv}(n, m, 2)$, is
based on the wrapped convolution function [17]. For $a \in \{0,1\}^{n+m-1}$ and
$y \in \{0,1\}^n$, the wrapped convolution $c = (a \bullet y)$ is defined as an
element of $\{0,1\}^m$ as follows: for each $i \in \{1, \ldots, m\}$,
$c[i] = \bigoplus_{j=1}^{n} (y[j] \wedge a[i+j-1])$, where $\oplus$ denotes logical
xor and $v[i]$ denotes the $i$-th component of the bit-vector $v$. The family
$H_{conv}(n, m, 2)$ is defined as
$\{h_{a,b}(y) = (a \bullet y) \oplus_m b \mid a \in \{0,1\}^{n+m-1}, b \in \{0,1\}^m\}$,
where $\oplus_m$ denotes componentwise xor of two elements of $\{0,1\}^m$. By
randomly choosing $a$ and $b$, we can randomly choose a function $h_{a,b}$ from this
family. It has been shown in [17] that $H_{conv}(n, m, 2)$ is pairwise independent.
Our implementation of a near-uniform generator of CNF SAT witnesses uses
$H_{conv}(n, m, 2)$.
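
To make the definition of the wrapped convolution family concrete, the following is a minimal Python sketch, not the authors' implementation. The function names wrapped_convolution and random_conv_hash are hypothetical, bit-vectors are plain lists of 0/1, and indices are shifted to 0-based, so a[i + j] in the code plays the role of $a[i+j-1]$ in the text.

```python
import random

def wrapped_convolution(a, y, m):
    """Return c in {0,1}^m with c[i] = XOR over j of (y[j] AND a[i+j-1]) (1-based in the text)."""
    n = len(y)
    assert len(a) == n + m - 1
    # 0-based indices: a[i + j] here corresponds to a[i+j-1] in the text.
    return [sum(y[j] & a[i + j] for j in range(n)) % 2 for i in range(m)]

def random_conv_hash(n, m, rng):
    """Draw h_{a,b} from H_conv(n, m, 2) by choosing the bit-vectors a and b uniformly."""
    a = [rng.getrandbits(1) for _ in range(n + m - 1)]
    b = [rng.getrandbits(1) for _ in range(m)]

    def h(y):
        c = wrapped_convolution(a, y, m)
        return [ci ^ bi for ci, bi in zip(c, b)]  # componentwise xor with b

    return h, a, b

if __name__ == "__main__":
    h, a, b = random_conv_hash(5, 3, random.Random(0))
    print(h([1, 0, 1, 1, 0]))  # a 3-bit hash value
```

Since each output bit is an xor of input bits, the constraint h(y) = α can be handed to a solver as m xor constraints, which is what makes this family attractive here.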



3 Related Algorithms in Prior Work 

We now discuss two algorithms that are closely related to our work. In 1998,
Bellare et al. [5] proposed the BGP algorithm, showing that uniform generation
of $\mathcal{NP}$-witnesses can be achieved in probabilistic polynomial time using an
$\mathcal{NP}$-oracle. This improved on previous work by Jerrum, Valiant and Vazirani
[14], who showed that uniform generation can be achieved in probabilistic polynomial
time using a $\Sigma_2^P$ oracle, and almost-uniform generation (as defined in [14])
can be achieved in probabilistic polynomial time using an $\mathcal{NP}$ oracle.

Let $R$ be an $\mathcal{NP}$-relation over $\Sigma$. The BGP algorithm takes as input
an $x \in L_R$ and either generates a witness that is uniformly distributed in $R_x$,
or produces a symbol $\bot$ (indicating a failed run). The pseudocode for the
algorithm is presented below. In the presentation, we assume w.l.o.g. that $n$ is an
integer such that $R_x \subseteq \{0,1\}^n$. We also assume access to
$\mathcal{NP}$-oracles to answer queries about cardinalities of witness sets and also
to enumerate small witness sets.

Algorithm BGP($x$):
/* Assume $R_x \subseteq \{0,1\}^n$ */
1: pivot $\leftarrow 2n^2$;
2: if ($|R_x| \le$ pivot)
3:   List all elements $y_1, \ldots, y_{|R_x|}$ of $R_x$;
4:   Choose $j$ at random from $\{1, \ldots, |R_x|\}$, and return $y_j$;
5: else
6:   $l \leftarrow 2\lceil \log_2 n \rceil$; $i \leftarrow l - 1$;
7:   repeat
8:     $i \leftarrow i + 1$;
9:     Choose $h$ at random from $H(n, i - l, n)$;
10:  until ($\forall \alpha \in \{0,1\}^{i-l}$, $|R_{x,h,\alpha}| \le 2n^2$) or ($i = n - 1$);
11:  if ($\exists \alpha \in \{0,1\}^{i-l}$, $|R_{x,h,\alpha}| > 2n^2$) return $\bot$;
12:  Choose $\alpha$ at random from $\{0,1\}^{i-l}$;
13:  List all elements $y_1, \ldots, y_{|R_{x,h,\alpha}|}$ of $R_{x,h,\alpha}$;
14:  Choose $j$ at random from $\{1, \ldots,$ pivot$\}$;
15:  if $j \le |R_{x,h,\alpha}|$, return $y_j$;
16:  else return $\bot$;



For clarity of exposition, we have made a small adaptation to the algorithm
originally presented in [5]. Specifically, if $h$ does not satisfy
$(\forall \alpha \in \{0,1\}^{i-l}, |R_{x,h,\alpha}| \le 2n^2)$ when the loop in
lines 7-10 terminates, the original algorithm forces a specific choice of $h$.
Instead, algorithm BGP simply outputs $\bot$ (indicating a failed run) in this
situation. A closer look at the analysis presented in [5] shows that all results
continue to hold with this adaptation. The authors of [5] use algebraic hash
functions and random choices of $n$-tuples in $GF(2^{\max(n, i-l)})$ to implement
the selection of a random hash function in line 9 of the pseudocode. The following
theorem summarizes the key properties of the BGP algorithm [5].

Theorem 1. If a run of the BGP algorithm is successful, the probability that
$y \in R_x$ is generated by the algorithm is independent of $y$. Further, the
probability that a run of the algorithm fails is $\le 0.8$.

Since the probability of any witness $y \in R_x$ being output by a successful run
of the algorithm is independent of y, the BGP algorithm guarantees uniform 
generation of witnesses. However, as we argue in the next section, scaling the 
algorithm to even medium-sized problem instances is quite difficult in practice. 
Indeed, we have found no published report discussing any implementation of the 
BGP algorithm. 

In 2007, Gomes et al. [12] presented two closely related algorithms named
XORSample and XORSample' for near-uniform sampling of combinatorial spaces.
A key idea in both these algorithms is to constrain a given instance $F$ of the CNF
SAT problem by a set of randomly selected xor constraints over the variables
appearing in $F$. An xor constraint over a set $V$ of variables is an equation of the
form $e = c$, where $c \in \{0, 1\}$ and $e$ is the logical xor of a subset of $V$.
A probability distribution $X(|V|, q)$ over the set of all xor constraints over $V$
is characterized by the probability $q$ of choosing a variable in $V$. A random xor
constraint from $X(|V|, q)$ is obtained by forming an xor constraint where each
variable in $V$ is chosen independently with probability $q$, and $c$ is chosen
uniformly at random.
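
As an illustration of the distribution $X(|V|, q)$ described above, here is a small Python sketch. The representation of a constraint as a pair (subset, c) and the names random_xor_constraint and satisfies are assumptions made for this example only.

```python
import random

def random_xor_constraint(variables, q, rng):
    """Sample one xor constraint: keep each variable independently with probability q,
    and choose the right-hand side c uniformly from {0, 1}."""
    subset = [v for v in variables if rng.random() < q]
    c = rng.getrandbits(1)
    return subset, c

def satisfies(assignment, constraint):
    """Check whether the xor of the selected variables equals c under the assignment."""
    subset, c = constraint
    return sum(assignment[v] for v in subset) % 2 == c

if __name__ == "__main__":
    rng = random.Random(42)
    constraint = random_xor_constraint([1, 2, 3, 4], 0.5, rng)
    print(constraint, satisfies({1: 1, 2: 0, 3: 1, 4: 0}, constraint))
```

With $q = 1/2$ each constraint mentions about half of the variables; smaller values of $q$ yield shorter constraints.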

We present the pseudocode of algorithm XORSample' below. The algorithm 
uses a function SATModelCount that takes a Boolean formula F and returns the 
exact count of witnesses of F. Algorithm XORSample' takes as inputs a CNF 
formula F , the parameter q discussed above and an integer s > 0. Suppose the 
number of variables in F is n. The algorithm proceeds by conjoining s xor con- 
straints to F, where the constraints are chosen randomly from the distribution 
X(n, q). Let F' denote the conjunction of F and the random xor constraints, and 
let $mc$ denote the model count (i.e., number of witnesses) of $F'$. If $mc \ge 1$, the
algorithm enumerates the witnesses of $F'$ and chooses one witness at random.
Otherwise, the algorithm outputs $\bot$, indicating a failed run.

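Algorithm XORSample'($F$, $q$, $s$)
/* $n$ = Number of variables in $F$ */
1: $Q_s \leftarrow$ {$s$ random xor constraints from $X(n, q)$};
2: $F' \leftarrow F \wedge (\bigwedge_{f \in Q_s} f)$;
3: $mc \leftarrow$ SATModelCount($F'$);
4: if ($mc \ge 1$)
5:   Choose $i$ at random from $\{1, \ldots, mc\}$;
6:   List the first $i$ witnesses of $F'$;
7:   return $i$-th witness of $F'$;
8: else return $\bot$;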
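
The control flow of XORSample' can be sketched in Python for very small formulas. This is an illustrative sketch only: brute-force enumeration stands in for SATModelCount and witness listing, clauses use a DIMACS-like convention (a positive integer v means variable v, a negative one its negation), and the function names are hypothetical.

```python
import itertools
import random

def witnesses(clauses, xors, n):
    """Brute-force list of assignments satisfying all CNF clauses and xor constraints."""
    out = []
    for bits in itertools.product([0, 1], repeat=n):
        cnf_ok = all(
            any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
            for clause in clauses
        )
        xor_ok = all(sum(bits[v - 1] for v in vs) % 2 == c for vs, c in xors)
        if cnf_ok and xor_ok:
            out.append(bits)
    return out

def xorsample_prime(clauses, n, q, s, rng):
    """Conjoin s random xor constraints to F, then pick a surviving witness at random."""
    xors = []
    for _ in range(s):
        vs = [v for v in range(1, n + 1) if rng.random() < q]
        xors.append((vs, rng.getrandbits(1)))
    sols = witnesses(clauses, xors, n)
    if len(sols) >= 1:
        return rng.choice(sols)
    return None  # failed run, the symbol "bottom" in the text

if __name__ == "__main__":
    print(xorsample_prime([[1, 2], [-3, 4]], 4, 0.5, 2, random.Random(7)))
```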



Algorithm XORSample can be viewed as a variant of algorithm XORSample' in
which we check if $mc$ is exactly 1 (instead of $mc \ge 1$) in line 4 of the
pseudocode. An additional difference is that if the check in line 4 fails, algorithm
XORSample starts afresh from line 1 by randomly choosing $s$ xor constraints. In our
experiments, we observed that XORSample' significantly outperforms XORSample,
hence we consider only XORSample' for comparison with our algorithm. The following
theorem is proved in [12].

Theorem 2. Let $F$ be a Boolean formula with $2^{s^*}$ solutions. Let $\alpha$ be such
that $0 < \alpha < s^*$ and $s = s^* - \alpha$. For a witness $y$ of $F$, the
probability with which XORSample' with parameters $q = \frac{1}{2}$ and $s$ outputs
$y$ is bounded below by $c'(\alpha) 2^{-s^*}$, where
$c'(\alpha) = \frac{1 - 2^{-\alpha/3}}{(1 + 2^{-\alpha})(1 + 2^{-\alpha/3})}$.
Further, XORSample' succeeds with probability larger than $c'(\alpha)$.

While the choice of $q = \frac{1}{2}$ allowed the authors of [12] to prove Theorem 2,
the authors acknowledge that finding witnesses of $F'$ is quite hard in practice when
random xor constraints are chosen from $X(n, \frac{1}{2})$. Therefore, they advocate
using values of $q$ much smaller than $\frac{1}{2}$. Unfortunately, the analysis that
yields the theoretical guarantees in Theorem 2 does not hold with these smaller
values of $q$. This illustrates the conflict between witness generators with good
performance in practice, and those with good theoretical guarantees.

4 The UniWit Algorithm: Design and Analysis 

We now describe an adaptation, called UniWit, of the BGP algorithm that scales 
to much larger problem sizes than those that can be handled by the BGP al- 
gorithm, while weakening the guarantee of uniform generation to that of near- 
uniform generation. Experimental results, however, indicate that the witnesses 
generated by our algorithm are fairly uniform in practice. Our algorithm can 
also be viewed as an adaptation of the XORSample' algorithm, in which we do 
not need to provide hard-to-estimate problem-specific parameters like s and q. 
We begin with some observations about the BGP algorithm. In what follows, 
line numbers refer to those in the pseudocode of the BGP algorithm presented in 
Section 3. Our first observation is that the loop in lines 7-10 of the pseudocode
iterates until either $|R_{x,h,\alpha}| \le 2n^2$ for every
$\alpha \in \{0,1\}^{i-l}$ or $i$ increments to $n-1$. Checking the first condition
is computationally prohibitive even for values of $i - l$ and $n$ as small as a few
tens. So we ask if this condition can be simplified, perhaps with some weakening of
theoretical guarantees. Indeed, we have found that if the condition requires that
$1 \le |R_{x,h,\alpha}| \le 2n^2$ for a specific $\alpha \in \{0,1\}^{i-l}$
(instead of for every $\alpha \in \{0,1\}^{i-l}$), we can still guarantee
near-uniformity (but not uniformity) of the generated witnesses. This suggests
choosing both a random $h \in H(n, i - l, n)$ and a random
$\alpha \in \{0,1\}^{i-l}$ within the loop of lines 7-10.

The analysis presented in [5] relies on h being sampled uniformly from a 
family of n-wise independent hash functions. In the context of generating SAT 
witnesses, n denotes the number of propositional variables in the input formula. 
This can be large (several thousands) in problems arising from directed random 
testing. Unfortunately, implementing n-wise independent hash functions using 
algebraic hash functions (as advocated in [5]) for large values of $n$ is
computationally infeasible in practice. This prompts us to ask if the BGP algorithm
can be adapted to work with $r$-wise independent hash functions for small values of
$r$, and if simpler families of hash functions can be used. Indeed, we have found
that with $r \ge 2$, an adapted version of the BGP algorithm can be made to
generate near-uniform witnesses. We can also bound the probability of failure of the
adapted algorithm by a constant. Significantly, the sufficiency of pairwise
independence allows us to use computationally efficient xor-based families of hash
functions, like $H_{conv}(n, m, 2)$ discussed in Section 2. This provides a significant
scaling advantage to our algorithm vis-a-vis the BGP algorithm in practice. 

In the context of uniform generation of SAT witnesses, checking if $|R_x| \le 2n^2$
(line 2 of pseudocode) or if $|R_{x,h,\alpha}| \le 2n^2$ (line 10 of pseudocode,
modified as suggested above) can be done either by approximate model-counting or by
repeated invocations of a SAT solver. State-of-the-art approximate model counting
techniques [11] rely on randomly sampling the witness space, suggesting a circular
dependency. Hence, we choose to use a SAT solver as the back-end engine
for enumerating and counting witnesses. Note that if $h$ is chosen randomly from
$H_{conv}(n, m, 2)$, the formula for which we seek witnesses is the conjunction of
the original (CNF) formula and xor constraints encoding the inclusion of each
witness in $h^{-1}(\alpha)$. We therefore choose to use a SAT solver optimized for
conjunctions of xor constraints and CNF clauses as the back-end engine; specifically,
we use CryptoMiniSAT (version 2.9.2) [1].

Modern SAT solvers often produce partial assignments that specify values 
of a subset of variables, such that every assignment of values to the remaining 
variables gives a witness. Since we must find large numbers
($2n^2 \approx 2 \times 10^6$ if $n \approx 1000$) of witnesses, it would be useful to
obtain partial assignments from the SAT solver. Unfortunately, conjoining random xor
constraints to the original formula reduces the likelihood that large sets of
witnesses can be encoded as partial assignments. Thus, each invocation of the SAT
solver is likely to generate only a few witnesses, necessitating a large number of
calls to the solver. To make matters worse, if the count of witnesses exceeds $2n^2$
and if $i < n-1$, the check in line 10 of the pseudocode of algorithm BGP (modified
as suggested above) fails, and the loop of lines 7-10 iterates once more, requiring
generation of up to $2n^2$ witnesses of a modified SAT problem all over again. This
can be computationally prohibitive in practice. Indeed, our implementation of the
BGP algorithm with CryptoMiniSAT failed to terminate on formulas with a few tens of
variables, even when running on high-performance computers for 20 hours. This
prompts us to ask if the required number of witnesses, or pivot, in the BGP
algorithm (see line 1 of the pseudocode) can be reduced. We answer this question in
the affirmative, and show that the pivot can indeed be reduced to $2n^{1/k}$, where
$k$ is an integer $\ge 1$. Note that if $k = 3$ and $n = 1000$, the value of
$2n^{1/k}$ is only 20, while $2n^2$ equals $2 \times 10^6$. This translates to a
significant leap in the sizes of problems for which we can generate random
witnesses. There are, however, some practical tradeoffs involved in the choice of
$k$; we defer a discussion of these to a later part of this section.
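
The arithmetic behind the pivot comparison above can be checked with a few lines of Python. The helper names bgp_pivot and uniwit_pivot are hypothetical; the small rounding step is only a guard against floating-point error in the $k$-th root and is not part of the algorithm.

```python
import math

def bgp_pivot(n):
    """Pivot used by the BGP algorithm: 2 * n^2."""
    return 2 * n ** 2

def uniwit_pivot(n, k):
    """Reduced pivot: ceiling of 2 * n^(1/k)."""
    # round() guards against results such as 19.999999999999996 or 20.000000000000004.
    return math.ceil(round(2 * n ** (1.0 / k), 9))

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, uniwit_pivot(1000, k), bgp_pivot(1000))
```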

We now present the UniWit algorithm, which implements the modifications
to the BGP algorithm suggested above. UniWit takes as inputs a CNF formula $F$
with $n$ variables, and an integer $k \ge 1$. The algorithm either outputs a witness
that is near-uniformly distributed over the space of all witnesses of $F$ or produces
a symbol $\bot$ indicating a failed run. We also assume that we have access to a
function BoundedSAT that takes as inputs a propositional formula $F$ that is a
conjunction of a CNF formula and xor constraints, and an integer $r \ge 0$, and
returns a set $S$ of witnesses of $F$ such that $|S| = \min(r, \#F)$, where $\#F$
denotes the count of all witnesses of $F$.

Algorithm UniWit($F$, $k$):
/* Assume $z_1, \ldots, z_n$ are variables in $F$ */
/* Choose a priori the family of hash functions $H(n, m, r)$, $r \ge 2$ to be used */
1: pivot $\leftarrow \lceil 2n^{1/k} \rceil$; $S \leftarrow$ BoundedSAT($F$, pivot + 1);
2: if ($|S| \le$ pivot)
3:   Let $y_1, \ldots, y_{|S|}$ be the elements of $S$;
4:   Choose $j$ at random from $\{1, \ldots, |S|\}$ and return $y_j$;
5: else
6:   $l \leftarrow \lfloor \frac{1}{k} \cdot \log_2 n \rfloor$; $i \leftarrow l - 1$;
7:   repeat
8:     $i \leftarrow i + 1$;
9:     Choose $h$ at random from $H(n, i - l, r)$;
10:    Choose $\alpha$ at random from $\{0,1\}^{i-l}$;
11:    $S \leftarrow$ BoundedSAT($F \wedge (h(z_1, \ldots, z_n) = \alpha)$, pivot + 1);
12:  until ($1 \le |S| \le$ pivot) or ($i = n$);
13:  if ($|S| >$ pivot) or ($|S| < 1$) return $\bot$;
14:  else
15:    Let $y_1, \ldots, y_{|S|}$ be the elements of $S$;
16:    Choose $j$ at random from $\{1, \ldots,$ pivot$\}$;
17:    if $j \le |S|$, return $y_j$;
18:    else return $\bot$;
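
The pseudocode above can be exercised on toy formulas with the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: bounded_sat enumerates assignments by brute force instead of calling CryptoMiniSAT, clauses use a DIMACS-like signed-integer convention, the hash family is the wrapped-convolution family from Section 2, and returning None for an unsatisfiable formula is a choice made here because the pseudocode does not cover that case.

```python
import itertools
import math
import random

def bounded_sat(clauses, n, limit, h=None, alpha=None):
    """Brute-force stand-in for BoundedSAT: up to `limit` witnesses of F (and h(z) = alpha)."""
    found = []
    for z in itertools.product([0, 1], repeat=n):
        if not all(any(z[abs(lit) - 1] == (lit > 0) for lit in cl) for cl in clauses):
            continue
        if h is not None and h(z) != alpha:
            continue
        found.append(z)
        if len(found) == limit:
            break
    return found

def random_conv_hash(n, m, rng):
    """Draw a hash function from H_conv(n, m, 2); outputs are m-bit tuples."""
    a = [rng.getrandbits(1) for _ in range(n + m - 1)]
    b = [rng.getrandbits(1) for _ in range(m)]
    def h(z):
        return tuple((sum(z[j] & a[i + j] for j in range(n)) + b[i]) % 2 for i in range(m))
    return h

def uniwit(clauses, n, k, rng):
    pivot = math.ceil(2 * n ** (1.0 / k))                  # line 1
    S = bounded_sat(clauses, n, pivot + 1)
    if len(S) <= pivot:                                    # lines 2-4
        return rng.choice(S) if S else None                # None if F is unsatisfiable
    l = math.floor(math.log2(n) / k)                       # line 6
    i = l - 1
    while True:                                            # lines 7-12
        i += 1
        m = i - l
        h = random_conv_hash(n, m, rng)
        alpha = tuple(rng.getrandbits(1) for _ in range(m))
        S = bounded_sat(clauses, n, pivot + 1, h, alpha)
        if 1 <= len(S) <= pivot or i == n:
            break
    if len(S) > pivot or len(S) < 1:                       # line 13
        return None
    j = rng.randint(1, pivot)                              # line 16
    return S[j - 1] if j <= len(S) else None               # lines 17-18

if __name__ == "__main__":
    print(uniwit([[1, 2], [-3, 4]], 4, 1, random.Random(3)))
```

Choosing $j$ from $\{1, \ldots,$ pivot$\}$ rather than from $\{1, \ldots, |S|\}$ in line 16 makes the acceptance probability of each witness in the cell independent of the cell size, at the cost of occasional failed runs.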

Implementation issues: There are four steps in UniWit (lines 4, 9, 10 and 16
of the pseudocode) where random choices are made. In our implementation, in
line 9 of the pseudocode, we choose a random hash function from the family
$H_{conv}(n, i - l, 2)$, since it is computationally efficient to do so. Recall from
Section 2 that choosing a random hash function from $H_{conv}(n, m, 2)$ requires
choosing two random bit-vectors. It is straightforward to implement these choices
and also the choice of a random $\alpha \in \{0,1\}^{i-l}$ in line 10 of the
pseudocode, if we have access to a source of independent and uniformly distributed
random bits. In lines 4 and 16, we must choose a random integer from a specified
range. By using standard techniques (see, for example, the discussion on coin tossing
in [5]), this can also be implemented efficiently if we have access to a source of
random bits. Since accessing truly random bits is a practical impossibility, our
implementation uses pseudorandom sequences of bits generated from nuclear
decay processes and available at HotBits [2]. We download and store a sufficiently
long sequence of random bits in a file, and access an appropriate number of bits
sequentially whenever needed.
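
One standard way to turn a stored stream of random bits into a uniform integer from a range is rejection sampling, sketched below in Python. This is an assumption about what such a technique could look like, not a description of the authors' code; bits_from_bytes and uniform_int_from_bits are hypothetical names.

```python
def bits_from_bytes(data):
    """Yield the bits of a bytes object, most significant bit of each byte first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def uniform_int_from_bits(bits, lo, hi):
    """Draw an integer uniformly from {lo, ..., hi} by rejection sampling on a bit iterator."""
    span = hi - lo + 1
    width = (span - 1).bit_length()   # number of bits needed to cover 0 .. span-1
    while True:
        value = 0
        for _ in range(width):
            value = (value << 1) | next(bits)
        if value < span:              # accept; otherwise discard and draw fresh bits
            return lo + value

if __name__ == "__main__":
    stream = bits_from_bytes(b"\xa7\x3c\x19")  # placeholder bytes, not real HotBits data
    print(uniform_int_from_bits(stream, 1, 20))
```

Each attempt is accepted with probability greater than 1/2, so the expected number of bits consumed per draw is less than twice the bit-width of the range.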

In line 11 of the pseudocode for UniWit, we invoke BoundedSAT with arguments
$F \wedge (h(z_1, \ldots, z_n) = \alpha)$ and pivot + 1. The function BoundedSAT is
implemented using CryptoMiniSAT (version 2.9.2), which allows passing a parameter
indicating the maximum number of witnesses to be generated. The sub-formula
$(h(z_1, \ldots, z_n) = \alpha)$ is constructed as follows. As mentioned in
Section 2, a random hash function from the family $H_{conv}(n, i - l, 2)$ can be
implemented by choosing a random $a \in \{0,1\}^{n+i-l-1}$ and a random
$b \in \{0,1\}^{i-l}$. Recalling the definition of $h$ from Section 2, the
sub-formula $(h(z_1, \ldots, z_n) = \alpha)$ is given by

$$\bigwedge_{j=1}^{i-l} \left( \left( \bigoplus_{p=1}^{n} (z_p \wedge a[j+p-1]) \oplus b[j] \right) \Leftrightarrow \alpha[j] \right).$$
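
Since $a$, $b$ and $\alpha$ are constants once chosen, each conjunct above reduces to an xor constraint over those variables $z_p$ with $a[j+p-1] = 1$, with right-hand side $\alpha[j] \oplus b[j]$. The Python sketch below performs this reduction; the function name and the (variable list, right-hand side) output format are assumptions for illustration and are independent of any particular solver's input syntax.

```python
def hash_to_xor_constraints(a, b, alpha, n):
    """Return one (variable_indices, rhs) pair per output bit of h_{a,b}(z) = alpha.

    Variables are numbered 1..n; the constraint says that the xor of the listed
    variables must equal rhs.
    """
    m = len(b)
    assert len(a) == n + m - 1 and len(alpha) == m
    constraints = []
    for j in range(m):
        # 0-based a[j + p] corresponds to a[j+p-1] in the text's 1-based notation.
        vars_j = [p + 1 for p in range(n) if a[j + p] == 1]
        rhs = alpha[j] ^ b[j]          # move the constant b[j] to the right-hand side
        constraints.append((vars_j, rhs))
    return constraints

if __name__ == "__main__":
    print(hash_to_xor_constraints([1, 0, 1, 1], [1, 0], [0, 0], 3))
```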

Analysis of UniWit: Let $R_F$ denote the set of witnesses of the input formula $F$.
Using notation discussed in Section 2, suppose $R_F \subseteq \{0,1\}^n$. For
simplicity of exposition, we assume that $\log_2 |R_F| - (1/k) \cdot \log_2 n$ is an
integer in the following discussion. A more careful analysis removes this assumption
with constant factor reductions in the probability of generation of an arbitrary
witness and in the probability of failure of UniWit.

Theorem 3. Suppose $F$ has $n$ variables and $n > 2^k$. For every witness $y$ of $F$,
the conditional probability that algorithm UniWit outputs $y$ on inputs $F$ and $k$,
given that the algorithm succeeds, is bounded below by $\frac{1}{8|R_F|}$.

Proof. Referring to the pseudocode of UniWit, if $|R_F| \le 2n^{1/k}$, the theorem
holds trivially. Suppose $|R_F| > 2n^{1/k}$, and let $Y$ denote the event that witness
$y$ in $R_F$ is output by UniWit on inputs $F$ and $k$. Let $p_{i,y}$ denote the
probability that the loop in lines 7-12 of the pseudocode terminates in iteration $i$
with $y$ in $R_{F,h,\alpha}$, where $\alpha \in \{0,1\}^{i-l}$ is the value chosen in
line 10. It follows from the pseudocode that
$\Pr[Y] \ge p_{i,y} \cdot (1/2n^{1/k})$, for every $i \in \{l, \ldots, n\}$. Let us
denote $\log_2 |R_F| - (1/k) \cdot \log_2 n$ by $m$. Therefore,
$2^m \cdot n^{1/k} = |R_F|$. Since $2n^{1/k} < |R_F| \le 2^n$ and since
$l = \lfloor (1/k) \cdot \log_2 n \rfloor$ (see line 6 of pseudocode), we have
$l < m + l \le n$. Consequently, $\Pr[Y] \ge p_{m+l,y} \cdot (1/2n^{1/k})$. The proof
is completed by showing that $p_{m+l,y} \ge \frac{1 - n^{-1/k}}{2^{m+1}}$. This gives
$\Pr[Y] \ge \frac{1 - n^{-1/k}}{2^{m+2} n^{1/k}} = \frac{1 - n^{-1/k}}{4|R_F|} \ge \frac{1}{8|R_F|}$,
if $n > 2^k$.

To calculate $p_{m+l,y}$, we first note that since $y \in R_F$, the requirement
"$y \in R_{F,h,\alpha}$" reduces to "$y \in h^{-1}(\alpha)$". For
$\alpha \in \{0,1\}^m$ and $y \in \{0,1\}^n$, we define $q_{m+l,y,\alpha}$ as
$\Pr\left[|R_{F,h,\alpha}| \le 2n^{1/k} \text{ and } h(y) = \alpha : h \xleftarrow{R} H(n, m, r)\right]$,
where $r \ge 2$. The proof is now completed by showing that
$q_{m+l,y,\alpha} \ge (1 - n^{-1/k})/2^{m+1}$ for every $\alpha \in \{0,1\}^m$ and
$y \in \{0,1\}^n$. Towards this end, we define an indicator variable
$\gamma_{y,\alpha}$ for every $y \in \{0,1\}^n$ and $\alpha \in \{0,1\}^m$ as follows:
$\gamma_{y,\alpha} = 1$ if $h(y) = \alpha$ and $\gamma_{y,\alpha} = 0$ otherwise.
Thus, $\gamma_{y,\alpha}$ is a random variable with probability distribution induced
by that of $h$. It is easy to show that (i) $E[\gamma_{y,\alpha}] = 2^{-m}$, and
(ii) the pairwise independence of $h$ implies pairwise independence of the
$\gamma_{y,\alpha}$ variables. We now define
$\Gamma_\alpha = \sum_{z \in R_F} \gamma_{z,\alpha}$ and
$\mu_{y,\alpha} = E[\Gamma_\alpha \mid \gamma_{y,\alpha} = 1]$. Clearly,
$\Gamma_\alpha = |R_{F,h,\alpha}|$ and
$\mu_{y,\alpha} = E\left[\sum_{z \in R_F} \gamma_{z,\alpha} \mid \gamma_{y,\alpha} = 1\right] = \sum_{z \in R_F} E[\gamma_{z,\alpha} \mid \gamma_{y,\alpha} = 1]$.
Using pairwise independence of the $\gamma_{y,\alpha}$ variables, the above simplifies
to $\mu_{y,\alpha} = 2^{-m}(|R_F| - 1) + 1 \le 2^{-m}|R_F| + 1 = n^{1/k} + 1$. From
Markov's inequality, we know that
$\Pr[\Gamma_\alpha < \kappa \cdot \mu_{y,\alpha} \mid \gamma_{y,\alpha} = 1] \ge 1 - 1/\kappa$
for $\kappa > 0$. With $\kappa = \frac{2n^{1/k}}{n^{1/k} + 1}$, this gives
$\Pr[|R_{F,h,\alpha}| \le 2n^{1/k} \mid \gamma_{y,\alpha} = 1] \ge (1 - n^{-1/k})/2$.
Since $h$ is chosen at random from $H(n, m, r)$, we also have
$\Pr[h(y) = \alpha] = 1/2^m$. It follows that
$q_{m+l,y,\alpha} \ge (1 - n^{-1/k})/2^{m+1}$. □

Theorem 4. Assuming $n > 2^k$, algorithm UniWit succeeds (i.e. does not return
$\bot$) with probability at least $\frac{1}{8}$.

Proof. Let $P_{succ}$ denote the probability that a run of algorithm UniWit succeeds.
By definition, $P_{succ} = \sum_{y \in R_F} \Pr[Y]$. Using Theorem 3,
$P_{succ} \ge \sum_{y \in R_F} \frac{1}{8|R_F|} = \frac{1}{8}$. □

One might be tempted to use large values of the parameter k to keep the
value of pivot low. However, there are tradeoffs involved in the choice of k. As
k increases, the pivot $2n^{1/k}$ reduces, and the chance that BoundedSAT finds
more than $2n^{1/k}$ witnesses increases, necessitating further iterations of the loop
in lines 7-12 of the pseudocode. Of course, reducing the pivot also means that
BoundedSAT has to find fewer witnesses, and each invocation of BoundedSAT is
likely to take less time. However, the increase in the number of invocations of
BoundedSAT contributes to increased overall time. In our experiments, we have
found that choosing k to be either 2 or 3 works well for all our benchmarks
(including those containing several thousand variables).
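To make the effect of k concrete, the following sketch evaluates the pivot $2n^{1/k}$ and the starting level $l = \lfloor (1/k) \log_2 n \rfloor$ for a few variable counts. The function names are hypothetical, and the variable counts are simply examples of the sizes reported in Table 1.

```python
import math


def pivot(n: int, k: int) -> float:
    """Pivot 2 * n^(1/k): the largest cell size BoundedSAT is asked to enumerate."""
    return 2.0 * n ** (1.0 / k)


def initial_level(n: int, k: int) -> int:
    """l = floor((1/k) * log2(n)), as in line 6 of the pseudocode."""
    return math.floor(math.log2(n) / k)


for n in (779, 5397, 7621):  # example variable counts
    for k in (2, 3):
        print(f"n={n:5d} k={k} pivot={pivot(n, k):7.2f} l={initial_level(n, k)}")
```

For formulae with a few thousand variables, k = 3 keeps the pivot in the tens, while k = 2 already pushes it above a hundred.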

A heuristic optimization: A (near-)uniform generator is likely to be invoked 
a large number of times for the same formula F when generating a set of wit- 
nesses of F. If the performance of the generator is sensitive to problem-specific 
parameter(s) not known a priori, a natural optimization is to estimate values of 
these parameter(s), perhaps using computationally expensive techniques, in the 
first few runs of the generator, and then re-use these estimates in subsequent 
runs on the same problem instance. Of course, this optimization works only if 
the parameter(s) under consideration can be reasonably estimated from the first 
few runs. We call this heuristic optimization "leapfrogging".

In the case of algorithm UniWit, the loop in lines 7-12 of the pseudocode starts
with i set to $l - 1$ and iterates until either i increments to n, or $|R_{F,h,\alpha}|$ becomes
no larger than $2n^{1/k}$. For each problem instance F, we propose to estimate a
lower bound of the value of i when the loop terminates, from the first few runs
of UniWit on F. In all subsequent runs of UniWit on F, we propose to start
iterating through the loop with i set to this lower bound. We call this specific
heuristic "leapfrogging i" in the context of UniWit. Note that leapfrogging may
also be used for the parameter s in algorithms XORSample' and XORSample (see
pseudocode of XORSample'). We discuss this further in Section 5.
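The loop and the leapfrogging heuristic described above can be sketched on a toy instance. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the witness set is given explicitly and searched by brute force in place of a SAT solver, the hash is a random affine XOR map rather than the family used in the paper, the hash has i - l output bits in iteration i, and the pivot is rounded down to an integer. All function names are hypothetical.

```python
import math
import random


def parity(x):
    return bin(x).count("1") & 1


def affine_hash(rows, offset, x):
    """h(x) = A*x XOR b over GF(2); rows[j] holds row j of A packed into an int."""
    return sum(parity(row & x) << j for j, row in enumerate(rows)) ^ offset


def bounded_sat(witnesses, rows, offset, alpha, limit):
    """Brute-force stand-in for BoundedSAT: up to `limit` witnesses w with h(w) = alpha."""
    found = []
    for w in witnesses:
        if affine_hash(rows, offset, w) == alpha:
            found.append(w)
            if len(found) == limit:
                break
    return found


def uniwit_sketch(witnesses, n, k, rng, start_i=None):
    """Return (witness or None, terminating i or None); start_i enables leapfrogging."""
    pivot = math.floor(2 * n ** (1.0 / k))
    cell = bounded_sat(witnesses, [], 0, 0, pivot + 1)
    if len(cell) <= pivot:  # few witnesses: no hashing needed
        return (rng.choice(cell) if cell else None), None
    l = math.floor(math.log2(n) / k)
    i = (l - 1) if start_i is None else start_i - 1
    while True:
        i += 1
        m = i - l  # assumed number of hash output bits in iteration i
        rows = [rng.getrandbits(n) for _ in range(m)]
        offset = rng.getrandbits(m) if m > 0 else 0
        alpha = rng.getrandbits(m) if m > 0 else 0
        cell = bounded_sat(witnesses, rows, offset, alpha, pivot + 1)
        if 1 <= len(cell) <= pivot or i >= n:
            break
    if not 1 <= len(cell) <= pivot:
        return None, i
    j = rng.randrange(pivot)  # accept with probability |cell| / pivot
    return (cell[j] if j < len(cell) else None), i


rng = random.Random(1)
n, k = 12, 2
witnesses = rng.sample(range(2 ** n), 600)  # toy witness set, not a real formula
first_runs = [uniwit_sketch(witnesses, n, k, rng) for _ in range(20)]
lower_bound = min(i for _, i in first_runs if i is not None)
later_runs = [uniwit_sketch(witnesses, n, k, rng, start_i=lower_bound) for _ in range(20)]
print("leapfrog start:", lower_bound)
print("sampled:", [w for w, _ in later_runs if w is not None][:5])
```

The first batch of runs starts at i = l and records where the loop stops; later runs start directly at the smallest terminating value seen, skipping the iterations that almost never succeed.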

5 Experimental Results 

To evaluate the performance of UniWit, we built a prototype implementation and
conducted an extensive set of experiments. Since our motivation stems primarily
from functional verification, our benchmarks were mostly derived from functional
verification of hardware designs. Specifically, we used "bit-blasted" versions of
word-level constraints arising from bounded model checking of public-domain
and proprietary word-level VHDL designs. In addition, we also used bit-blasted
versions of several SMTLib [3] benchmarks of the
"QF_BV/bruttomesso/simple-processor/" category, and benchmarks arising from
"Type I" representations of ISCAS'85 circuits, as described in [9].

All our experiments were conducted on a high-performance computing cluster.
Each individual experiment was run on a single node of the cluster, and the
cluster allowed multiple experiments to run in parallel. Every node in the cluster
had two quad-core Intel Xeon processors running at 2.83 GHz with 4 GB of physical
memory. We used 3000 seconds as the timeout interval for each invocation
of BoundedSAT in UniWit, and 20 hours as the timeout interval for the overall
algorithm. If an invocation of BoundedSAT in line 11 of the pseudocode timed
out (after 3000 seconds), we repeated the iteration (lines 7-12 of the pseudocode
of UniWit) without incrementing i. If the overall algorithm timed out (after 20
hours), we considered the algorithm to have failed. We used either 2 or 3 for
the value of the parameter k (see pseudocode of UniWit). This corresponds to
restricting the pivot to a few tens of witnesses for formulae with a few thousand
variables. The exact values of k used for a subset of the benchmarks are indicated
in Table 1. A full analysis of the effect of parameter k will require a separate
study. As explained earlier, our implementation uses the family $H_{conv}(n, m, 2)$
to select random hash functions in step 9 of the pseudocode.

For purposes of comparison, we also implemented and conducted experiments
with algorithms BGP [5], XORSample and XORSample' [12], using CryptoMiniSAT
as the SAT solver in all cases. Algorithm BGP timed out without producing
any witness in all but the simplest of cases (involving fewer than 20 variables). This
is primarily because checking whether $|R_{F,h,\alpha}| \le 2n^2$ for a given $h \in H(n,m,n)$
and for every $\alpha \in \{0,1\}^m$, as required in step 10 of algorithm BGP, is
computationally prohibitive for values of n and m exceeding a few tens. Hence, we do
not report any comparison with algorithm BGP. Of the algorithms XORSample
and XORSample', algorithm XORSample' consistently outperformed algorithm
XORSample in terms of both actual time taken and uniformity of generated
witnesses. This can be largely attributed to the stringent requirement that
algorithm XORSample be provided a parameter s that renders the model count
of the input formula F constrained with s random xor constraints to exactly 1.
Our experiments indicated that it was extremely difficult to predict or leapfrog
the range of values for s such that it met the strict requirement of the model
count being exactly 1. This forced us to expend significant computing resources
to estimate the right value for s in almost every run, leading to huge performance
overheads. Since algorithm XORSample' consistently outperformed algorithm
XORSample, we focus on comparisons with only algorithm XORSample' in
the subsequent discussion. Note that our benchmarks, when viewed as Boolean
circuits, had up to 695 circuit inputs, and 21 of them had more than 95 inputs
each. While UniWit and XORSample' completed execution on all these benchmarks,
we could not build ROBDDs for 18 of the above 21 benchmarks within
our timeout limit and with 4 GB of memory.

                                 UniWit                                                                XORSample'
Benchmark    Variables  Clauses  k      Range (i)          Average Run Time (s)              Variance  Average Run Time (s)  Variance
case_3_b14   779        2480     2 / 3  [34,35] / [36,37]  49.29+5.27 / 19.32+1.44           1.58      15061.85+59.31        3.47
case_2_b14   519        1607     3      [38,39]            22.13+2.09                        0.57      18005.58+0.73         9.51
case203      214        580      3      [42,44]            16.41+1.04                        8.98      18006.85+2.78         230.5
case145      219        558      3      [42,44]            19.84+1.42                        1.62      18007.18+2.99         2.32
case14       270        717      2      [44,45]            54.07+2.33                        0.65      18004.8+0.9           28.16
case61       289        773      3      [44,46]            30.39+5.49                        1.33      18009.1+4.4           11.92
case9        302        821      3      [45,47]            25.64+1.54                        2.07      18004.79+0.87         46.15
case10       351        946      2      [60,61]            204.93+17.99                      6.1       18008.42+4.85         10.56
case15       319        842      3      [61,63]            91.84+14.64                       0.82      18008.34+5.08         11.04
case140      488        1222     3      [99,101]           288.63+23.53                      3.4       21214.85+200.64       6.71
squaring14   5397       18141    3      [28,30]            2399.19+1243.81                   -         7089.6+2088.46        -
squaring7    5567       18969    3      [26,29]            2358.45+1720.49                   -         4841.4+2340.84        -
case39       590        1789     2      [50,50]            710.65+85.22                      -         18159.12+138.22       -
case_2_ptb   7621       24889    3      [72,73]            1643.2+225.41                     -         22251.8+177.61        -
case_1_ptb   7624       24897    2 / 3  [70,70] / [72,73]  17295.45+454.64 / 1639.16+219.87  -         22346.64+204.07       -

Table 1. Performance comparison of UniWit and XORSample'
(Where two values of k are listed, entries separated by "/" correspond to k = 2 and k = 3 respectively; "-" marks a missing variance entry.)

Table 1 presents results of our experiments comparing performance and
uniformity of generated witnesses for UniWit and XORSample' on a subset of
benchmarks. The tool and the complete set of results on over 200 benchmarks are
available at http://www.cfdvs.iitb.ac.in/reports/reports/CAV13/. The first
three columns in Table 1 give the name, number of variables and number of
clauses of the benchmarks represented as CNF formulae. The columns grouped
under UniWit give details of runs of UniWit, while those grouped under XORSample'
give details of runs of XORSample'. For runs of UniWit, the column labeled "k"
gives the value of the parameter k used in the corresponding experiment. The
column labeled "Range (i)" shows the range of values of i when the loop in lines
7-12 of the pseudocode (see Section 4) terminated in 100 independent runs of
the algorithm on the benchmark under consideration. Significantly, this range is
uniformly narrow for all our experiments with UniWit. As a result, leapfrogging
i is very effective for UniWit.

The column labeled "Run Time" under UniWit in Table 1 gives run times
in seconds, separated as $time_1 + time_2$, where $time_1$ gives the average time
(over 100 independent runs) to obtain a witness and to identify the lower bound
of i for leapfrogging in later runs, while $time_2$ gives the average time to get
a solution once we leapfrog i. Our experiments clearly show that leapfrogging
i reduces run-times by almost an order of magnitude in most cases. We also
report "Run Time" for XORSample', where times are again reported as
$time_1 + time_2$. In this case, $time_1$ gives the average time (over 100 independent runs)
taken to find the value of the parameter s in algorithm XORSample' using a
binary search technique, as outlined in a footnote in [12]. As can be seen from
Table 1, this is a computationally expensive step, and often exceeds $time_1$ under
UniWit by more than two to three orders of magnitude. Once the range of the
parameter s is identified from the first 100 independent runs, we use the lower
bound of this range to leapfrog s in subsequent runs of XORSample' on the same
problem instance. The values of $time_2$ under "Run Time" for XORSample' give
the average time taken to generate witnesses after leapfrogging s. Note that
the difference between $time_2$ values for UniWit and XORSample' algorithms is
far less pronounced than the difference between $time_1$ values. In addition, the
$time_1$ values for XORSample' are two to four orders of magnitude larger than the
corresponding $time_2$ values, while this factor is almost always less than an order
of magnitude for UniWit. Therefore, the total time taken for $n_1$ runs without
leapfrogging, followed by $n_2$ runs with leapfrogging, for XORSample' far exceeds
that for UniWit, even for $n_1 = 100$ and $n_2 \approx 10^{\ldots}$. This illustrates the significant
practical efficiency of UniWit vis-a-vis XORSample'.
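The total-time comparison can be worked through for a single benchmark. The sketch below takes the two run-time pairs reported in Table 1 for the benchmark with 519 variables (written here as case_2_b14) and computes the total for $n_1$ runs without leapfrogging plus $n_2$ runs with it. The function names are hypothetical, and the value of $n_2$ is an illustrative choice rather than a figure taken from the text.

```python
def total_time(time1: float, time2: float, n1: int, n2: int) -> float:
    """Total seconds for n1 runs without leapfrogging followed by n2 runs with it."""
    return n1 * time1 + n2 * time2


def break_even_runs(fast_start, slow_start, n1: int):
    """n2 at which the two totals coincide, or None if they never cross.

    Each argument is a (time1, time2) pair; fast_start has the smaller time1.
    """
    (a1, a2), (b1, b2) = fast_start, slow_start
    if a2 <= b2:
        return None  # fast_start is at least as fast per leapfrogged run as well
    return n1 * (b1 - a1) / (a2 - b2)


# (time1, time2) in seconds for benchmark case_2_b14, as listed in Table 1.
uniwit = (22.13, 2.09)
xorsample_prime = (18005.58, 0.73)

n1 = 100
n2 = 10 ** 5  # illustrative choice, not a value taken from the text
print(total_time(*uniwit, n1, n2), total_time(*xorsample_prime, n1, n2))
print(break_even_runs(uniwit, xorsample_prime, n1))
```

On this benchmark XORSample' has the smaller $time_2$, so its total would eventually catch up, but only after more than a million leapfrogged runs; for benchmarks where UniWit also has the smaller $time_2$, the helper returns None because the totals never cross.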

Table 1 also reports the scaled statistical variance of relative frequencies of
witnesses generated by $5 \times 10^{\ldots}$ runs of the two algorithms on several benchmarks.
The scaled statistical variance is computed as
$\frac{K}{N-1} \sum_{i=1}^{N} \left( f_i - \frac{\sum_{j=1}^{N} f_j}{N} \right)^2$, where
N denotes the number of distinct witnesses generated, $f_i$ denotes the relative
frequency of the $i$-th witness, and K ($10^{10}$) denotes a scaling constant used to
facilitate easier comparison. The smaller the scaled variance, the more uniform
is the generated distribution. Unfortunately, getting a reliable estimate of the
variance requires generating witnesses from runs that sample the witness space
sufficiently well. While we could do this for several benchmarks (listed towards
the top of Table 1), other benchmarks (listed towards the bottom of Table 1) had
witness spaces too large to conduct these experiments within available resources.
For those benchmarks where we have variance data, we observe that the variance
obtained using XORSample' is larger (by up to a factor of 43) than that obtained
using UniWit in almost all cases. Overall, our experiments indicate that UniWit
always works significantly faster and gives more (or comparably) uniformly
distributed witnesses vis-a-vis XORSample' in almost all cases. We also measured
the probability of success of UniWit for each benchmark as the ratio of the number
of runs for which the algorithm did not return $\bot$ to the total number of runs.
We found that this exceeded 0.6 for every benchmark using UniWit.
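A scaled variance of this kind can be computed directly from a list of sampled witnesses. The sketch below is one reading of the definition, under stated assumptions: it uses the sample-variance normalisation $K/(N-1)$ and a default scaling constant of $10^{10}$, both of which are reconstructed from damaged text, and the function name `scaled_variance` is hypothetical.

```python
from collections import Counter


def scaled_variance(samples, scale=1e10):
    """Scaled variance of the relative frequencies of the distinct items in `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    n_distinct = len(freqs)
    if n_distinct < 2:
        return 0.0  # variance is undefined for a single witness; report zero spread
    mean = sum(freqs) / n_distinct
    return scale / (n_distinct - 1) * sum((f - mean) ** 2 for f in freqs)


# Perfectly uniform sampling gives zero; any skew gives a positive value.
print(scaled_variance(["a", "b", "a", "b"]))
print(scaled_variance(["a", "a", "b", "c"], scale=1.0))
```

Only witnesses that were actually generated contribute, so a sampler that misses part of the witness space can still report a small value; this is one reason the frequency plots are a useful complement to the variance figures.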

Fig. 1. Sampling by UniWit (k=2)

Fig. 2. Sampling by XORSample'


As an illustration of the difference in uniformity of witnesses generated by
UniWit and XORSample', Figures 1 and 2 depict the frequencies of appearance of
various witnesses using these two algorithms for an input CNF formula (case110)
with 287 variables and 16,384 satisfying assignments. The horizontal axis in each
figure represents witnesses numbered suitably, while the vertical axis represents
the generated frequencies of witnesses. The frequencies were obtained from
$10.8 \times 10^6$ successful runs of each algorithm. Interestingly, XORSample' could find only
15,612 solutions (note the empty vertical band at the right end of Figure 2),
while UniWit found all 16,384 solutions. Further, XORSample' generated each of
15 solutions more than 5,500 times, and more than 250 solutions were generated
only once. No such major deviations from uniformity were however observed in
the frequencies generated by UniWit. We also found that 15,624 out of 16,384 (i.e.
95.36%) witnesses generated by UniWit had frequencies in excess of $N_{unif}/8$,
where $N_{unif} = 10.8 \times 10^6 / 16384 \approx 659$. In contrast, only 6,047 (i.e. 36.91%)
witnesses generated by XORSample' had frequencies in excess of $N_{unif}/8$.
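The figures quoted in this paragraph can be re-derived with a few lines of arithmetic. The sketch below reads the number of successful runs as $10.8 \times 10^6$, which is the value consistent with $N_{unif} \approx 659$ for 16,384 witnesses, and adds a hypothetical helper, `uniformity_summary`, that applies the same threshold test to a list of per-witness counts. The toy counts are made up for illustration.

```python
def uniformity_summary(counts, total_runs, num_witnesses, divisor=8):
    """Return (N_unif, witnesses with count > N_unif/divisor, that number as a % of all witnesses)."""
    n_unif = total_runs / num_witnesses
    above = sum(1 for c in counts if c > n_unif / divisor)
    return n_unif, above, 100.0 * above / num_witnesses


# Re-deriving the reported figures for case110 from the counts quoted in the text.
n_unif = 10.8e6 / 16384
uniwit_pct = 100 * 15624 / 16384
xorsample_pct = 100 * 6047 / 16384
print(round(n_unif), round(uniwit_pct, 2), round(xorsample_pct, 2))

# Toy data (made up for illustration): four witnesses, one of them badly under-sampled.
toy = uniformity_summary([700, 700, 700, 50], total_runs=2150, num_witnesses=4)
print(toy)
```

Both percentages are taken relative to all 16,384 witnesses, so witnesses that a sampler never produced count against it.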

6 Concluding Remarks 

We described UniWit, an algorithm that near-uniformly samples random witnesses
of Boolean formulas. We showed that the algorithm scales to reasonably
large problems. We also showed that it performs better, in terms of both run
time and uniformity, than previous best-of-breed algorithms for this problem.
The theoretical guarantees can be further improved with higher independence of
the family of hash functions used in UniWit (see
http://www.cfdvs.iitb.ac.in/reports/reports/CAV13/ for details).



We have yet to fully explore the parameter space and the effect of pseudo-random
generators other than HotBits for UniWit. There is a trade-off between
failure probability, time for first witness, and time for subsequent witnesses. During
our experiments, we observed an acute dearth of benchmarks available in
the public domain for this important problem. We hope that our work will lead
to development of benchmarks for this problem. Our focus here has been on
Boolean constraints, which play a prominent role in hardware design. Extending
the algorithm to handle user-provided biases would be an interesting direction of
future work. Yet another interesting extension would be to consider richer constraint
languages and build a uniform generator of witnesses modulo theories,
leveraging recent progress in satisfiability modulo theories, cf. [10].